66 research outputs found

    The Influence of Text Pre-processing on Plagiarism Detection

    This paper explores the influence of text pre-processing techniques on plagiarism detection. We examine stop-word removal, lemmatization, number replacement, synonymy recognition, and word generalization. We also look into the influence of punctuation and word order within N-grams. All these techniques are evaluated according to their impact on F1-measure and speed of execution. Our experiments were performed on a Czech corpus of plagiarized documents about politics. At the end of this paper, we propose what we consider to be the best combination of text pre-processing techniques.
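    As a rough illustration of the kind of pipeline this abstract describes, the sketch below applies stop-word removal and number replacement before comparing word N-grams, and scores a detector with the F1-measure. It is not the authors' implementation: the English stop-word list, the `<num>` placeholder, the containment score, and all function names are assumptions made for this example (the paper itself works on Czech text).

```python
import re

# Hypothetical English stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def preprocess(text, remove_stop_words=True, replace_numbers=True):
    """Lowercase, drop punctuation, then optionally replace numbers and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    if replace_numbers:
        # Every all-digit token collapses to one placeholder (an assumed convention).
        tokens = ["<num>" if t.isdigit() else t for t in tokens]
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def ngrams(tokens, n=3, ordered=True):
    """Set of word N-grams; with ordered=False, word order inside an N-gram is ignored."""
    grams = set()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        grams.add(tuple(gram) if ordered else tuple(sorted(gram)))
    return grams


def containment(suspicious, source, n=3, ordered=True, **options):
    """Fraction of the suspicious text's N-grams that also occur in the source."""
    a = ngrams(preprocess(suspicious, **options), n, ordered)
    b = ngrams(preprocess(source, **options), n, ordered)
    return len(a & b) / len(a) if a else 0.0


def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall."""
    detected = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / relevant if relevant else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

    Calling `containment` on the same pair of texts with the pre-processing options switched on and off shows how strongly such steps can change the overlap score, which is the kind of effect the paper measures.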

    PAN@FIRE: Overview of the cross-language !ndian Text re-use detection competition

    The development of models for automatic detection of text re-use and plagiarism across languages has received increasing attention in recent years. However, the lack of an evaluation framework composed of annotated datasets has caused these efforts to be isolated. In this paper we present the CL!TR 2011 corpus, the first manually created corpus for the analysis of cross-language text re-use between English and Hindi. The corpus was used during the Cross-Language !ndian Text Re-Use Detection Competition. Here we overview the approaches applied by the contestants and evaluate their quality when detecting a re-used text together with its source.
    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-40087-2_6
    This research work is partially funded by the WIQ-EI (IRSES grant n. 269180) and ACCURAT (grant n. 248347) projects, and the Seventh Framework Programme (FP7/2007-2013) under grant agreement n. 246016 from the European Union. The first author was partially funded by the CONACyT-Mexico 192021 grant and currently works under the ERCIM “Alain Bensoussan” Fellowship Programme. The research of the second author is in the framework of the VLC/Campus Microcluster on Multimodal Interaction in Intelligent Systems and partially funded by the MICINN research project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (plan I+D+i). The research from AU-KBC Centre is supported by the Cross Lingual Information Access (CLIA) Phase II Project.
    Barrón Cedeño, LA.; Rosso, P.; Sobha, LD.; Clough, P.; Stevenson, M. (2013). PAN@FIRE: Overview of the cross-language !ndian Text re-use detection competition. In Multilingual Information Access in South Asian Languages. Springer Verlag (Germany). 7536:59-70. https://doi.org/10.1007/978-3-642-40087-2_6

    A model for transition of 5'-nuclease domain of DNA polymerase I from inert to active modes

    Bacteria contain DNA polymerase I (PolI), a single polypeptide chain consisting of ~930 residues, possessing DNA-dependent DNA polymerase, 3'-5' proofreading and 5'-3' exonuclease (also known as flap endonuclease) activities. PolI is particularly important in the processing of Okazaki fragments generated during lagging strand replication and must ultimately produce a double-stranded substrate with a nick suitable for DNA ligase to seal. PolI's activities must be highly coordinated both temporally and spatially; otherwise, uncontrolled 5'-nuclease activity could attack a nick and produce extended gaps leading to potentially lethal double-strand breaks. To investigate the mechanism of how PolI efficiently produces these nicks, we present theoretical studies on the dynamics of two possible scenarios or models. In one, the flap DNA substrate can transit from the polymerase active site to the 5'-nuclease active site, with the relative position of the two active sites being kept fixed; in the other, the 5'-nuclease domain can transit from the inactive mode, with the 5'-nuclease active site distant from the cleavage site on the DNA substrate, to the active mode, where the active site and substrate cleavage site are juxtaposed. The theoretical results based on the former scenario are inconsistent with the available experimental data, which indicate that the majority of 5'-nucleolytic processing events are carried out by the same PolI molecule that has just extended the upstream primer terminus. By contrast, the theoretical results on the latter model, which is constructed based on available structural studies, are consistent with the experimental data. We thus conclude that the latter model, rather than the former one, is a reasonable description of the cooperation of PolI's polymerase and 5'-3' exonuclease activities. Moreover, predicted results for the latter model are presented.

    Pooling and expanding registries of familial hypercholesterolaemia to assess gaps in care and improve disease management and outcomes : Rationale and design of the global EAS Familial Hypercholesterolaemia Studies Collaboration

    Background: The potential for global collaborations to better inform public health policy regarding major non-communicable diseases has been successfully demonstrated by several large-scale international consortia. However, the true public health impact of familial hypercholesterolaemia (FH), a common genetic disorder associated with premature cardiovascular disease, is yet to be reliably ascertained using similar approaches. The European Atherosclerosis Society FH Studies Collaboration (EAS FHSC) is a new initiative of international stakeholders which will help establish a global FH registry to generate large-scale, robust data on the burden of FH worldwide. Methods: The EAS FHSC will maximise the potential exploitation of currently available and future FH data (retrospective and prospective) by bringing together regional/national/international data sources with access to individuals with a clinical and/or genetic diagnosis of heterozygous or homozygous FH. A novel bespoke electronic platform and FH Data Warehouse will be developed to allow secure data sharing, validation, cleaning, pooling, harmonisation and analysis irrespective of the source or format. Standard statistical procedures will allow us to investigate cross-sectional associations, patterns of real-world practice and trends over time, and to analyse risk and outcomes (e.g. cardiovascular outcomes, all-cause death), accounting for potential confounders and subgroup effects. Conclusions: The EAS FHSC represents an excellent opportunity to integrate individual efforts across the world to tackle the global burden of FH. The information garnered from the registry will help reduce gaps in knowledge, inform best practices, assist in clinical trials design, support clinical guidelines and policies development, and ultimately improve the care of FH patients. © 2016 Elsevier Ireland Ltd.

    The wonders of flap endonucleases: structure, function, mechanism and regulation.

    Processing of Okazaki fragments to complete lagging strand DNA synthesis requires coordination among several proteins. RNA primers and DNA synthesised by DNA polymerase α are displaced by DNA polymerase δ to create bifurcated nucleic acid structures known as 5'-flaps. These 5'-flaps are removed by Flap Endonuclease 1 (FEN), a structure-specific nuclease whose divalent metal ion-dependent phosphodiesterase activity cleaves 5'-flaps with exquisite specificity. FENs are paradigms for the 5' nuclease superfamily, whose members perform a wide variety of roles in nucleic acid metabolism using a similar nuclease core domain that displays common biochemical properties and structural features. A detailed review of FEN structure is undertaken to show how DNA substrate recognition occurs and how FEN achieves cleavage at a single phosphate diester. A proposed double nucleotide unpairing trap (DoNUT) is discussed with regard to FEN and has relevance to the wider 5' nuclease superfamily. The homotrimeric proliferating cell nuclear antigen protein (PCNA) coordinates the actions of DNA polymerase, FEN and DNA ligase by facilitating the hand-off of intermediates between each protein during Okazaki fragment maturation, to maximise through-put and minimise the consequences of intermediates being released into the wider cellular environment. FEN has numerous partner proteins that modulate and control its action during DNA replication and is also controlled by several post-translational modification events, all acting in concert to maintain precise and appropriate cleavage of Okazaki fragment intermediates during DNA replication.

    Efficacy and safety of bempedoic acid for the treatment of hypercholesterolemia: A systematic review and meta-analysis

    Background: Bempedoic acid is a first-in-class lipid-lowering drug recommended by guidelines for the treatment of hypercholesterolemia. Our objective was to estimate its average effect on plasma lipids in humans and its safety profile. Methods and findings: We carried out a systematic review and meta-analysis of phase II and III randomized controlled trials on bempedoic acid (PROSPERO: CRD42019129687). PubMed (Medline), Scopus, Google Scholar, and Web of Science databases were searched, with no language restriction, from inception to 5 August 2019. We included 10 RCTs (n = 3,788) comprising 26 arms (active arm [n = 2,460]; control arm [n = 1,328]). Effect sizes for changes in lipids and high-sensitivity C-reactive protein (hsCRP) serum concentration were expressed as mean differences (MDs) and 95% confidence intervals (CIs). For safety analyses, odds ratios (ORs) and 95% CIs were calculated using the Mantel–Haenszel method. Bempedoic acid significantly reduced total cholesterol (MD −14.94%; 95% CI −17.31%, −12.57%; p < 0.001), non-high-density lipoprotein cholesterol (MD −18.17%; 95% CI −21.14%, −15.19%; p < 0.001), low-density lipoprotein cholesterol (MD −22.94%; 95% CI −26.63%, −19.25%; p < 0.001), low-density lipoprotein particle number (MD −20.67%; 95% CI −23.84%, −17.48%; p < 0.001), apolipoprotein B (MD −15.18%; 95% CI −17.41%, −12.95%; p < 0.001), high-density lipoprotein cholesterol (MD −5.83%; 95% CI −6.14%, −5.52%; p < 0.001), high-density lipoprotein particle number (MD −3.21%; 95% CI −6.40%, −0.02%; p = 0.049), and hsCRP (MD −27.03%; 95% CI −31.42%, −22.64%; p < 0.001). Bempedoic acid did not significantly modify triglyceride level (MD −1.51%; 95% CI −3.75%, 0.74%; p = 0.189), very-low-density lipoprotein particle number (MD 3.79%; 95% CI −9.81%, 17.39%; p = 0.585), or apolipoprotein A-1 (MD −1.83%; 95% CI −5.23%, 1.56%; p = 0.290). Treatment with bempedoic acid was associated with an increased risk of discontinuation of treatment (OR 1.37; 95% CI 1.06, 1.76; p = 0.015), elevated serum uric acid (OR 3.55; 95% CI 1.03, 12.27; p = 0.045), elevated liver enzymes (OR 4.28; 95% CI 1.34, 13.71; p = 0.014), and elevated creatine kinase (OR 3.79; 95% CI 1.06, 13.51; p = 0.04), though it was strongly associated with a decreased risk of new onset or worsening diabetes (OR 0.59; 95% CI 0.39, 0.90; p = 0.01). The main limitation of this meta-analysis is related to the relatively small number of individuals involved in the studies, which were often short or middle term in length. Conclusions: Our results show that bempedoic acid has favorable effects on lipid profile and hsCRP levels and an acceptable safety profile. Further well-designed studies are needed to explore its longer-term safety.
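    For readers unfamiliar with the Mantel–Haenszel method named in this abstract, the sketch below computes a pooled odds ratio from a set of 2x2 tables, one per trial. It is a minimal illustration of the standard point estimate only (no confidence interval), not the authors' analysis code; the function name and the trial counts are invented for the example and are not data from the meta-analysis.

```python
def mantel_haenszel_or(tables):
    """Pooled Mantel-Haenszel odds ratio across 2x2 tables.

    Each table is a tuple (a, b, c, d):
      a = events in the treated arm,  b = non-events in the treated arm,
      c = events in the control arm,  d = non-events in the control arm.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty table contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these tables")
    return numerator / denominator


# Made-up counts for three hypothetical trials (not taken from the paper).
hypothetical_trials = [(12, 188, 9, 191), (30, 470, 20, 480), (8, 92, 5, 95)]
print(round(mantel_haenszel_or(hypothetical_trials), 2))
```

    Each trial's contribution is weighted by its size through the division by `n`, so large trials dominate the pooled estimate without any single table needing to be inverted.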

    The impact of type of dietary protein, animal versus vegetable, in modifying cardiometabolic risk factors: A position paper from the International Lipid Expert Panel (ILEP)

    Proteins play a crucial role in metabolism, in maintaining fluid and acid-base balance and antibody synthesis. Dietary proteins are important nutrients and are classified into: 1) animal proteins (meat, fish, poultry, eggs and dairy), and, 2) plant proteins (legumes, nuts and soy). Dietary modification is one of the most important lifestyle changes that has been shown to significantly decrease the risk of cardiovascular (CV) disease (CVD) by attenuating related risk factors. The CVD burden is reduced by optimum diet through replacement of unprocessed meat with low saturated fat, animal proteins and plant proteins. In view of the available evidence, it has become acceptable to emphasize the role of optimum nutrition to maintain arterial and CV health. Such healthy diets are thought to increase satiety, facilitate weight loss, and improve CV risk. Different studies have compared the benefits of omnivorous and vegetarian diets. Animal protein related risk has been suggested to be greater with red or processed meat over and above poultry, fish and nuts, which carry a lower risk for CVD. In contrast, others have shown no association of red meat intake with CVD. The aim of this expert opinion recommendation was to elucidate the different impact of animal vs vegetable protein on modifying cardiometabolic risk factors. Many observational and interventional studies confirmed that increasing protein intake, especially plant-based proteins and certain animal-based proteins (poultry, fish, unprocessed red meat low in saturated fats and low-fat dairy products) have a positive effect in modifying cardiometabolic risk factors. Red meat intake correlates with increased CVD risk, mainly because of its non-protein ingredients (saturated fats). However, the way red meat is cooked and preserved matters. Thus, it is recommended to substitute red meat with poultry or fish in order to lower CVD risk. 
Specific amino acids have favourable results in modifying major risk factors for CVD, such as hypertension. Apart from meat, other animal-source proteins, like those found in dairy products (especially whey protein), are inversely correlated with hypertension, obesity and insulin resistance.

    Impact of nutraceuticals on markers of systemic inflammation: Potential relevance to cardiovascular diseases – A position paper from the International Lipid Expert Panel (ILEP)

    Inflammation is a marker of arterial disease stemming from cholesterol-dependent to -independent molecular mechanisms. In recent years, the role of inflammation in atherogenesis has been underpinned by pharmacological approaches targeting systemic inflammation that have led to a significant reduction in cardiovascular disease (CVD) risk. Although the use of nutraceuticals to prevent CVD has largely focused on lipid-lowering (e.g., red-yeast rice and omega-3 fatty acids), there is growing interest and need, especially now in the time of the coronavirus pandemic, in the use of nutraceuticals to reduce inflammatory markers, and potentially the inflammatory CVD burden. However, there is still not enough evidence to confirm this. Indeed, diet is an important lifestyle determinant of health and can influence both systemic and vascular inflammation, to varying extents, according to the individual nutraceutical constituents. Thus, the aim of this Position Paper is to provide the first attempt at recommendations on the use of nutraceuticals with effective anti-inflammatory properties.

    Event-by-event correlations between $\Lambda$ ($\bar{\Lambda}$) hyperon global polarization and handedness with charged hadron azimuthal separation in Au+Au collisions at $\sqrt{s_{\text{NN}}} = 27$ GeV from STAR

    Global polarizations ($P$) of $\Lambda$ ($\bar{\Lambda}$) hyperons have been observed in non-central heavy-ion collisions. The strong magnetic field primarily created by the spectator protons in such collisions would split the $\Lambda$ and $\bar{\Lambda}$ global polarizations ($\Delta P = P_{\Lambda} - P_{\bar{\Lambda}} < 0$). Additionally, quantum chromodynamics (QCD) predicts topological charge fluctuations in vacuum, resulting in a chirality imbalance or parity violation in a local domain. This would give rise to an imbalance ($\Delta n = \frac{N_{\text{L}} - N_{\text{R}}}{\langle N_{\text{L}} + N_{\text{R}} \rangle} \neq 0$) between left- and right-handed $\Lambda$ ($\bar{\Lambda}$) as well as a charge separation along the magnetic field, referred to as the chiral magnetic effect (CME). This charge separation can be characterized by the parity-even azimuthal correlator ($\Delta\gamma$) and parity-odd azimuthal harmonic observable ($\Delta a_{1}$). Measurements of $\Delta P$, $\Delta\gamma$, and $\Delta a_{1}$ have not led to definitive conclusions concerning the CME or the magnetic field, and $\Delta n$ has not been measured previously. Correlations among these observables may reveal new insights. This paper reports measurements of the correlation between $\Delta n$ and $\Delta a_{1}$, which is sensitive to chirality fluctuations, and the correlation between $\Delta P$ and $\Delta\gamma$, which is sensitive to the magnetic field, in Au+Au collisions at 27 GeV. For both measurements, no correlations have been observed beyond statistical fluctuations.
    Comment: 10 pages, 10 figures; paper from the STAR Collaboration